Overview

Dataset statistics

Number of variables 7
Number of observations 588101
Missing cells 0
Missing cells (%) 0.0%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 27.5 MiB
Average record size in memory 49.0 B
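
The two memory figures above agree with each other under a simple storage model. The sketch below is a plausibility check, not a statement of how the tool measures memory: it assumes (the report does not list dtypes) that each numeric and categorical column costs 8 bytes per row, that the Boolean column costs 1 byte per row, and that index overhead is negligible. The column counts come from the "Variable types" block.

```python
# Assumed per-value widths in bytes. These are hypothetical: the report
# does not state dtypes, so 8/8/1 is a guess that matches a common layout.
BYTES_PER_VALUE = {"numeric": 8, "categorical": 8, "boolean": 1}

# Column counts taken from the "Variable types" block of the report.
COLUMN_COUNTS = {"numeric": 4, "categorical": 2, "boolean": 1}

N_ROWS = 588_101  # "Number of observations"


def record_size_bytes() -> int:
    """Bytes needed to store one row under the assumed widths."""
    return sum(BYTES_PER_VALUE[kind] * n for kind, n in COLUMN_COUNTS.items())


def total_size_mib(n_rows: int = N_ROWS) -> float:
    """Total size in MiB, where 1 MiB = 1024 * 1024 bytes."""
    return record_size_bytes() * n_rows / (1024 * 1024)


if __name__ == "__main__":
    print(record_size_bytes(), "bytes per record")
    print(round(total_size_mib(), 1), "MiB in total")
```

Under these assumptions the record size is 49 bytes and the total rounds to 27.5 MiB, which matches the reported values.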

Variable types

Numeric 4
Categorical 2
Boolean 1

Alerts

Unnamed: 0 is highly overall correlated with total ads [High correlation]
test group is highly overall correlated with user id [High correlation]
total ads is highly overall correlated with Unnamed: 0 [High correlation]
user id is highly overall correlated with test group [High correlation]
test group is highly imbalanced (75.8%) [Imbalance]
converted is highly imbalanced (83.0%) [Imbalance]
Unnamed: 0 is uniformly distributed [Uniform]
Unnamed: 0 has unique values [Unique]
user id has unique values [Unique]
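
The imbalance percentages in the alerts can be reproduced from the class counts listed later in the report. The sketch below assumes the score is one minus the Shannon entropy of the class shares, normalized by the maximum entropy for that number of classes. The report does not state this formula, so treat it as an assumption that happens to match the reported numbers; `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (in bits) of the
    class shares and k the number of classes. 0 means perfectly balanced,
    1 means a single class holds every row."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Class counts copied from the "test group" and "converted" sections.
test_group = imbalance_score([564_577, 23_524])  # ad, psa
converted = imbalance_score([573_258, 14_843])   # False, True

if __name__ == "__main__":
    print(f"test group: {test_group:.1%}")
    print(f"converted:  {converted:.1%}")
```

With these counts the function gives 75.8% for test group and 83.0% for converted, in line with the two Imbalance alerts.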

Reproduction

Analysis started 2025-11-21 17:13:32.424676
Analysis finished 2025-11-21 17:13:42.892257
Duration 10.47 seconds
Software version ydata-profiling v4.18.0
Download configuration config.json
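
The reported duration follows directly from the two timestamps. A minimal check using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from "Analysis started" and "Analysis finished".
started = datetime.strptime("2025-11-21 17:13:32.424676", FMT)
finished = datetime.strptime("2025-11-21 17:13:42.892257", FMT)

duration_seconds = (finished - started).total_seconds()

if __name__ == "__main__":
    print(f"{duration_seconds:.2f} seconds")
```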

Variables

Unnamed: 0
Real number (ℝ)

High correlation  Uniform  Unique 

Distinct 588101
Distinct (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 294050
Minimum 0
Maximum 588100
Zeros 1
Zeros (%) < 0.1%
Negative 0
Negative (%) 0.0%
Memory size 4.5 MiB

Quantile statistics

Minimum 0
5-th percentile 29405
Q1 147025
median 294050
Q3 441075
95-th percentile 558695
Maximum 588100
Range 588100
Interquartile range (IQR) 294050

Descriptive statistics

Standard deviation 169770.28
Coefficient of variation (CV) 0.57735174
Kurtosis -1.2
Mean 294050
Median Absolute Deviation (MAD) 147025
Skewness 0
Sum 1.729311 × 10^11
Variance 2.8821948 × 10^10
Monotonicity Strictly increasing
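
These statistics are what one would expect if the column is simply a row counter 0, 1, ..., N - 1, for example an index written out when the file was saved. That reading is an inference from the numbers, not something the report states. The sketch below computes the closed-form values for such a counter, using the unbiased (divide by N - 1) variance, which is an assumption about the estimator behind the reported figure.

```python
import math

N = 588_101  # number of rows; the column is assumed to hold 0, 1, ..., N - 1

total = N * (N - 1) // 2            # sum of the integers 0..N-1
mean = total / N                    # equals (N - 1) / 2
median = (N - 1) / 2                # middle value of an odd-length run
sample_variance = N * (N + 1) / 12  # unbiased variance of 0..N-1
std = math.sqrt(sample_variance)
cv = std / mean                     # approaches 1 / sqrt(3) for large N

if __name__ == "__main__":
    print(f"sum      = {total:.6e}")
    print(f"mean     = {mean:.0f}")
    print(f"median   = {median:.0f}")
    print(f"variance = {sample_variance:.7e}")
    print(f"std      = {std:.2f}")
    print(f"cv       = {cv:.8f}")
```

The closed forms reproduce the reported sum, mean, median, variance, standard deviation, and coefficient of variation to the precision shown in the report.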
Histogram with fixed size bins (bins=50)
Common Values

Value Count Frequency (%)
0 1 < 0.1%
1 1 < 0.1%
2 1 < 0.1%
3 1 < 0.1%
4 1 < 0.1%
5 1 < 0.1%
6 1 < 0.1%
7 1 < 0.1%
8 1 < 0.1%
9 1 < 0.1%
Other values (588091) 588091 > 99.9%

Minimum 10 values

Value Count Frequency (%)
0 1 < 0.1%
1 1 < 0.1%
2 1 < 0.1%
3 1 < 0.1%
4 1 < 0.1%
5 1 < 0.1%
6 1 < 0.1%
7 1 < 0.1%
8 1 < 0.1%
9 1 < 0.1%

Maximum 10 values

Value Count Frequency (%)
588100 1 < 0.1%
588099 1 < 0.1%
588098 1 < 0.1%
588097 1 < 0.1%
588096 1 < 0.1%
588095 1 < 0.1%
588094 1 < 0.1%
588093 1 < 0.1%
588092 1 < 0.1%
588091 1 < 0.1%

user id
Real number (ℝ)

High correlation  Unique 

Distinct 588101
Distinct (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 1310692.2
Minimum 900000
Maximum 1654483
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 4.5 MiB

Quantile statistics

Minimum 900000
5-th percentile 1006830
Q1 1143190
median 1313725
Q3 1484088
95-th percentile 1620397
Maximum 1654483
Range 754483
Interquartile range (IQR) 340898

Descriptive statistics

Standard deviation 202225.98
Coefficient of variation (CV) 0.15428945
Kurtosis -1.0433138
Mean 1310692.2
Median Absolute Deviation (MAD) 170453
Skewness -0.10035776
Sum 7.708194 × 10^11
Variance 4.0895348 × 10^10
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common Values

Value Count Frequency (%)
1069124 1 < 0.1%
1119715 1 < 0.1%
1144181 1 < 0.1%
1435133 1 < 0.1%
1015700 1 < 0.1%
1137664 1 < 0.1%
1116205 1 < 0.1%
1496843 1 < 0.1%
1448851 1 < 0.1%
1446284 1 < 0.1%
Other values (588091) 588091 > 99.9%

Minimum 10 values

Value Count Frequency (%)
900000 1 < 0.1%
900001 1 < 0.1%
900002 1 < 0.1%
900003 1 < 0.1%
900004 1 < 0.1%
900005 1 < 0.1%
900006 1 < 0.1%
900007 1 < 0.1%
900008 1 < 0.1%
900009 1 < 0.1%

Maximum 10 values

Value Count Frequency (%)
1654483 1 < 0.1%
1654482 1 < 0.1%
1654480 1 < 0.1%
1654478 1 < 0.1%
1654477 1 < 0.1%
1654476 1 < 0.1%
1654475 1 < 0.1%
1654473 1 < 0.1%
1654472 1 < 0.1%
1654471 1 < 0.1%

test group
Categorical

High correlation  Imbalance 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 4.5 MiB
ad 564577
psa 23524

Length

Max length 3
Median length 2
Mean length 2.0399999
Min length 2

Characters and Unicode

Total characters 1199726
Distinct characters 4
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
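
The length and character totals for this variable can be derived from the two value counts alone. The sketch below recomputes them; the counts are copied from the report and the variable names are illustrative.

```python
# Value counts for "test group", copied from the report.
counts = {"ad": 564_577, "psa": 23_524}

n_rows = sum(counts.values())

# Each row contributes len(value) characters.
total_characters = sum(len(value) * n for value, n in counts.items())
mean_length = total_characters / n_rows

# Distinct characters across all category labels.
distinct_characters = len(set("".join(counts)))

if __name__ == "__main__":
    print("total characters:", total_characters)
    print("mean length:", mean_length)
    print("distinct characters:", distinct_characters)
```

The mean comes out a hair under 2.04 because the psa share is slightly below 4%, which is why the report shows 2.0399999 rather than 2.04.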

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row ad
2nd row ad
3rd row ad
4th row ad
5th row ad

Common Values

Value Count Frequency (%)
ad 564577 96.0%
psa 23524 4.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
ad 564577 96.0%
psa 23524 4.0%

Most occurring characters

Value Count Frequency (%)
a 588101 49.0%
d 564577 47.1%
p 23524 2.0%
s 23524 2.0%

Most occurring categories

Value Count Frequency (%)
(unknown) 1199726 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
a 588101 49.0%
d 564577 47.1%
p 23524 2.0%
s 23524 2.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 1199726 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
a 588101 49.0%
d 564577 47.1%
p 23524 2.0%
s 23524 2.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 1199726 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
a 588101 49.0%
d 564577 47.1%
p 23524 2.0%
s 23524 2.0%

converted
Boolean

Imbalance 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 574.4 KiB
False 573258
True 14843
Value Count Frequency (%)
False 573258 97.5%
True 14843 2.5%

total ads
Real number (ℝ)

High correlation 

Distinct 807
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 24.820876
Minimum 1
Maximum 2065
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 4.5 MiB

Quantile statistics

Minimum 1
5-th percentile 1
Q1 4
median 13
Q3 27
95-th percentile 88
Maximum 2065
Range 2064
Interquartile range (IQR) 23

Descriptive statistics

Standard deviation 43.715181
Coefficient of variation (CV) 1.7612263
Kurtosis 109.91798
Mean 24.820876
Median Absolute Deviation (MAD) 10
Skewness 7.433113
Sum 14597182
Variance 1911.017
Monotonicity Not monotonic
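
Several of the descriptive statistics are functions of the others, so they can be cross-checked without the raw data. The sketch below recomputes the mean, variance, coefficient of variation, range, and IQR from the reported sum, standard deviation, quartiles, and extremes. The variable names are illustrative.

```python
# Inputs copied from the "total ads" section of the report.
n = 588_101                # number of observations
total = 14_597_182         # Sum
std = 43.715181            # Standard deviation
q1, q3 = 4, 27             # Q1 and Q3
minimum, maximum = 1, 2065

derived = {
    "mean": total / n,            # Sum / N
    "variance": std ** 2,         # square of the standard deviation
    "cv": std / (total / n),      # standard deviation / mean
    "range": maximum - minimum,
    "iqr": q3 - q1,
}

if __name__ == "__main__":
    for name, value in derived.items():
        print(f"{name:>8}: {value}")
```

The derived values agree with the reported mean (24.820876), variance (1911.017), CV (1.7612263), range (2064), and IQR (23). The mean sitting well above the median of 13 is consistent with the strong right skew reported for this variable.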
Histogram with fixed size bins (bins=50)
Common Values

Value Count Frequency (%)
1 56606 9.6%
2 39827 6.8%
5 29303 5.0%
3 28661 4.9%
4 23426 4.0%
6 23409 4.0%
7 19095 3.2%
15 19031 3.2%
8 16037 2.7%
12 15154 2.6%
Other values (797) 317552 54.0%

Minimum 10 values

Value Count Frequency (%)
1 56606 9.6%
2 39827 6.8%
3 28661 4.9%
4 23426 4.0%
5 29303 5.0%
6 23409 4.0%
7 19095 3.2%
8 16037 2.7%
9 12546 2.1%
10 11865 2.0%

Maximum 10 values

Value Count Frequency (%)
2065 1 < 0.1%
1778 1 < 0.1%
1680 1 < 0.1%
1632 1 < 0.1%
1491 1 < 0.1%
1414 1 < 0.1%
1398 1 < 0.1%
1391 1 < 0.1%
1354 1 < 0.1%
1328 1 < 0.1%

most ads day
Categorical

Distinct 7
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 4.5 MiB
Friday 92608
Monday 87073
Sunday 85391
Thursday 82982
Saturday 81660
Other values (2) 158387

Length

Max length 9
Median length 8
Mean length 7.10438
Min length 6

Characters and Unicode

Total characters 4178093
Distinct characters 17
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
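
The character-level figures for this variable follow from the seven day counts: every row contributes the characters of its day name. The sketch below rebuilds the character frequencies with a weighted `collections.Counter`. The counts are copied from the report, and the counting is case-sensitive, so `S` and `s` are separate characters.

```python
from collections import Counter

# Value counts for "most ads day", copied from the report.
day_counts = {
    "Friday": 92_608,
    "Monday": 87_073,
    "Sunday": 85_391,
    "Thursday": 82_982,
    "Saturday": 81_660,
    "Wednesday": 80_908,
    "Tuesday": 77_479,
}

# Weight each character of a day name by the number of rows with that day.
char_counts = Counter()
for day, n in day_counts.items():
    for ch in day:
        char_counts[ch] += n

total_characters = sum(char_counts.values())
mean_length = total_characters / sum(day_counts.values())

if __name__ == "__main__":
    print("total characters:", total_characters)
    print("distinct characters:", len(char_counts))
    print("mean length:", round(mean_length, 5))
    for ch, n in char_counts.most_common(3):
        print(ch, n, f"{n / total_characters:.1%}")
```

This reproduces the reported total of 4178093 characters, 17 distinct characters, and a mean length of 7.10438. The letter `a` edges out `d` because Saturday contains two of the former while Wednesday contains two of the latter, and Saturday is slightly more frequent.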

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Monday
2nd row Tuesday
3rd row Tuesday
4th row Tuesday
5th row Friday

Common Values

Value Count Frequency (%)
Friday 92608 15.7%
Monday 87073 14.8%
Sunday 85391 14.5%
Thursday 82982 14.1%
Saturday 81660 13.9%
Wednesday 80908 13.8%
Tuesday 77479 13.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
Friday 92608 15.7%
Monday 87073 14.8%
Sunday 85391 14.5%
Thursday 82982 14.1%
Saturday 81660 13.9%
Wednesday 80908 13.8%
Tuesday 77479 13.2%

Most occurring characters

Value Count Frequency (%)
a 669761 16.0%
d 669009 16.0%
y 588101 14.1%
u 327512 7.8%
r 257250 6.2%
n 253372 6.1%
s 241369 5.8%
e 239295 5.7%
S 167051 4.0%
T 160461 3.8%
Other values (7) 604912 14.5%

Most occurring categories

Value Count Frequency (%)
(unknown) 4178093 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
a 669761 16.0%
d 669009 16.0%
y 588101 14.1%
u 327512 7.8%
r 257250 6.2%
n 253372 6.1%
s 241369 5.8%
e 239295 5.7%
S 167051 4.0%
T 160461 3.8%
Other values (7) 604912 14.5%

Most occurring scripts

Value Count Frequency (%)
(unknown) 4178093 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
a 669761 16.0%
d 669009 16.0%
y 588101 14.1%
u 327512 7.8%
r 257250 6.2%
n 253372 6.1%
s 241369 5.8%
e 239295 5.7%
S 167051 4.0%
T 160461 3.8%
Other values (7) 604912 14.5%

Most occurring blocks

Value Count Frequency (%)
(unknown) 4178093 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
a 669761 16.0%
d 669009 16.0%
y 588101 14.1%
u 327512 7.8%
r 257250 6.2%
n 253372 6.1%
s 241369 5.8%
e 239295 5.7%
S 167051 4.0%
T 160461 3.8%
Other values (7) 604912 14.5%

most ads hour
Real number (ℝ)

Distinct 24
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 14.469061
Minimum 0
Maximum 23
Zeros 5536
Zeros (%) 0.9%
Negative 0
Negative (%) 0.0%
Memory size 4.5 MiB

Quantile statistics

Minimum 0
5-th percentile 8
Q1 11
median 14
Q3 18
95-th percentile 22
Maximum 23
Range 23
Interquartile range (IQR) 7

Descriptive statistics

Standard deviation 4.8346339
Coefficient of variation (CV) 0.33413599
Kurtosis 0.10323692
Mean 14.469061
Median Absolute Deviation (MAD) 3
Skewness -0.33697157
Sum 8509269
Variance 23.373685
Monotonicity Not monotonic
Histogram with fixed size bins (bins=24)
Common Values

Value Count Frequency (%)
13 47655 8.1%
12 47298 8.0%
11 46210 7.9%
14 45648 7.8%
15 44683 7.6%
10 38939 6.6%
16 37567 6.4%
17 34988 5.9%
18 32323 5.5%
9 31004 5.3%
Other values (14) 181786 30.9%

Minimum 10 values

Value Count Frequency (%)
0 5536 0.9%
1 4802 0.8%
2 5333 0.9%
3 2679 0.5%
4 722 0.1%
5 765 0.1%
6 2068 0.4%
7 6405 1.1%
8 17627 3.0%
9 31004 5.3%

Maximum 10 values

Value Count Frequency (%)
23 20166 3.4%
22 26432 4.5%
21 29976 5.1%
20 28923 4.9%
19 30352 5.2%
18 32323 5.5%
17 34988 5.9%
16 37567 6.4%
15 44683 7.6%
14 45648 7.8%

Interactions

[Figure: 16 pairwise interaction plots]

Correlations

Unnamed: 0 converted most ads day most ads hour test group total ads user id
Unnamed: 0 1.000 0.109 0.101 -0.000 0.123 -0.508 -0.038
converted 0.109 1.000 0.026 0.025 0.009 0.081 0.012
most ads day 0.101 0.026 1.000 0.029 0.020 0.008 0.032
most ads hour -0.000 0.025 0.029 1.000 0.015 -0.003 -0.040
test group 0.123 0.009 0.020 0.015 1.000 0.000 1.000
total ads -0.508 0.081 0.008 -0.003 0.000 1.000 0.001
user id -0.038 0.012 0.032 -0.040 1.000 0.001 1.000
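
The High correlation alerts in the overview can be traced back to this matrix. The sketch below scans the upper triangle for coefficients whose absolute value reaches a cutoff. The cutoff of 0.5 is an assumption, since the report does not state the alert threshold, and `high_pairs` is a hypothetical helper name.

```python
# Correlation matrix copied from the report, in the report's column order.
names = ["Unnamed: 0", "converted", "most ads day", "most ads hour",
         "test group", "total ads", "user id"]
matrix = [
    [1.000, 0.109, 0.101, -0.000, 0.123, -0.508, -0.038],
    [0.109, 1.000, 0.026, 0.025, 0.009, 0.081, 0.012],
    [0.101, 0.026, 1.000, 0.029, 0.020, 0.008, 0.032],
    [-0.000, 0.025, 0.029, 1.000, 0.015, -0.003, -0.040],
    [0.123, 0.009, 0.020, 0.015, 1.000, 0.000, 1.000],
    [-0.508, 0.081, 0.008, -0.003, 0.000, 1.000, 0.001],
    [-0.038, 0.012, 0.032, -0.040, 1.000, 0.001, 1.000],
]

THRESHOLD = 0.5  # assumed alert cutoff; not stated in the report


def high_pairs(names, matrix, threshold=THRESHOLD):
    """Return (name_a, name_b, r) for each distinct pair with |r| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


# A correlation matrix should equal its own transpose.
is_symmetric = all(
    matrix[i][j] == matrix[j][i]
    for i in range(len(names))
    for j in range(len(names))
)

if __name__ == "__main__":
    print("symmetric:", is_symmetric)
    for a, b, r in high_pairs(names, matrix):
        print(f"{a} <-> {b}: {r:+.3f}")
```

With a 0.5 cutoff only two pairs qualify, Unnamed: 0 with total ads and test group with user id, which are the same two pairs named in the alerts.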

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

Unnamed: 0 user id test group converted total ads most ads day most ads hour
0 0 1069124 ad False 130 Monday 20
1 1 1119715 ad False 93 Tuesday 22
2 2 1144181 ad False 21 Tuesday 18
3 3 1435133 ad False 355 Tuesday 10
4 4 1015700 ad False 276 Friday 14
5 5 1137664 ad False 734 Saturday 10
6 6 1116205 ad False 264 Wednesday 13
7 7 1496843 ad False 17 Sunday 18
8 8 1448851 ad False 21 Tuesday 19
9 9 1446284 ad False 142 Monday 14
Unnamed: 0 user id test group converted total ads most ads day most ads hour
588091 588091 1002731 ad False 2 Tuesday 23
588092 588092 1245606 ad False 7 Tuesday 23
588093 588093 1313930 ad False 15 Tuesday 23
588094 588094 1383070 ad False 1 Tuesday 23
588095 588095 1387658 ad False 2 Tuesday 23
588096 588096 1278437 ad False 1 Tuesday 23
588097 588097 1327975 ad False 1 Tuesday 23
588098 588098 1038442 ad False 3 Tuesday 23
588099 588099 1496395 ad False 1 Tuesday 23
588100 588100 1237779 ad False 1 Tuesday 23